We will explore each of the variables in the wine data set. Our main focus is the quality variable, since we want to get an idea of which components of a wine affect its quality.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
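Summaries of the kind shown above are what R's `summary()` prints for a numeric column. As a minimal sketch, assuming the data frame is called `ww` (as in the grouped output later on) and using a few made-up rows rather than the real data:

```r
# Hypothetical miniature stand-in for the white wine data; values are made up.
ww <- data.frame(
  quality = c(5, 6, 6, 7, 5, 6),
  alcohol = c(9.5, 10.4, 11.4, 12.8, 9.0, 10.1),
  pH      = c(3.09, 3.18, 3.28, 3.30, 3.00, 3.19)
)

# Min, quartiles, median, mean and max for every column at once.
# sapply() collects the per-column summaries into one matrix.
summaries <- sapply(ww, summary)
print(summaries)
```

Collecting the summaries in one matrix makes it easier to compare the scale and spread of the variables side by side.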
The white wines data set has 4898 observations and 13 variables.

### What is/are the main feature(s) of interest in your dataset?

The variable of interest is quality.

### What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

All other variables are characteristics of the wine that should influence its quality.

### Did you create any new variables from existing variables in the dataset?

I only added the wine type, in case I need to study the red wines.

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Most of the variables follow a roughly normal distribution. I did not perform any operations on the data.
In this part we will go further by studying the relationships between the variables. We will start with a matrix of the variables to get a quick idea of how they are related to each other.
## ww$alcohol.range: <8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 3.5 4.0 4.0 4.5 5.0
## ------------------------------------------------------------
## ww$alcohol.range: 8-9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 5.000 5.606 6.000 8.000
## ------------------------------------------------------------
## ww$alcohol.range: 9-10
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 5.000 5.488 6.000 8.000
## ------------------------------------------------------------
## ww$alcohol.range: 10-11
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.864 6.000 9.000
## ------------------------------------------------------------
## ww$alcohol.range: 11-12
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 6.000 6.000 6.191 7.000 8.000
## ------------------------------------------------------------
## ww$alcohol.range: 12-13
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 6.000 7.000 6.571 7.000 9.000
## ------------------------------------------------------------
## ww$alcohol.range: 13-14
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.00 6.00 7.00 6.72 7.00 8.00
## ------------------------------------------------------------
## ww$alcohol.range: >14
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7 7 7 7 7 7
## ww$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.587 4.600 6.393 10.700 16.200
## ------------------------------------------------------------
## ww$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.300 2.500 4.628 7.100 17.550
## ------------------------------------------------------------
## ww$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.800 7.000 7.335 11.500 23.500
## ------------------------------------------------------------
## ww$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.700 5.300 6.442 9.900 65.800
## ------------------------------------------------------------
## ww$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.700 3.650 5.186 7.325 19.250
## ------------------------------------------------------------
## ww$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.800 2.100 4.300 5.671 8.200 14.800
## ------------------------------------------------------------
## ww$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.60 2.00 2.20 4.12 4.20 10.60
## ww$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.870 3.035 3.215 3.188 3.325 3.550
## ------------------------------------------------------------
## ww$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.830 3.070 3.160 3.183 3.280 3.720
## ------------------------------------------------------------
## ww$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.790 3.080 3.160 3.169 3.240 3.790
## ------------------------------------------------------------
## ww$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.080 3.180 3.189 3.280 3.810
## ------------------------------------------------------------
## ww$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.840 3.100 3.200 3.214 3.320 3.820
## ------------------------------------------------------------
## ww$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.940 3.120 3.230 3.219 3.330 3.590
## ------------------------------------------------------------
## ww$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.200 3.280 3.280 3.308 3.370 3.410
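The grouped summaries above have the shape that `by()` prints when a numeric column is summarised within the levels of a factor. The sketch below shows one way such an `alcohol.range` factor could be built with `cut()`. The bin edges and the closed-on-the-right intervals are assumptions inferred from the labels in the output, and the rows are made up:

```r
# Hypothetical sample; the real analysis uses the full white wine data frame.
ww <- data.frame(
  alcohol = c(8.0, 8.7, 9.4, 10.2, 11.5, 12.3, 13.1, 14.2),
  quality = c(4, 5, 5, 6, 6, 7, 7, 7)
)

# Bin alcohol into one-point ranges. With cut()'s default right = TRUE,
# each bin includes its upper edge, so a wine at exactly 8 falls in "<8".
ww$alcohol.range <- cut(
  ww$alcohol,
  breaks = c(-Inf, 8:14, Inf),
  labels = c("<8", "8-9", "9-10", "10-11", "11-12", "12-13", "13-14", ">14")
)

# Summarise quality within each alcohol range.
quality_by_alcohol <- by(ww$quality, ww$alcohol.range, summary)
print(quality_by_alcohol)

# Median quality per range is often easier to compare at a glance.
median_by_range <- tapply(ww$quality, ww$alcohol.range, median)
print(median_by_range)
```

Swapping the roles of the two columns, for example `by(ww$pH, ww$quality, summary)`, gives the per-quality summaries of a single component.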
There is no clear relationship between quality and any single variable. Alcohol is the most promising candidate: low alcohol levels are associated with lower quality wine, so a minimum alcohol level may be a useful guideline. Some ranges of other variables, such as pH and chlorides, also look interesting. Other recommendations are that volatile acidity should be below 0.6 and free sulfur dioxide below 50 ppm.
There are also some relationships among the other variables: alcohol and density, residual sugar and density, and fixed acidity and pH.
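Pairwise relationships like these can be checked numerically with a correlation matrix. The following is only a sketch on synthetic data: the toy relationship (density falling with alcohol and rising with residual sugar) is an assumption chosen to mimic the direction described in the text, and the column names are illustrative.

```r
set.seed(42)  # reproducible made-up data, not the real wine measurements
n <- 200
alcohol <- runif(n, 8, 14)
residual.sugar <- runif(n, 1, 20)

# Assumed toy relationship: density falls with alcohol and rises with sugar.
dens <- 1.0 - 0.001 * alcohol + 0.0004 * residual.sugar + rnorm(n, sd = 0.0005)

toy <- data.frame(
  alcohol = alcohol,
  residual.sugar = residual.sugar,
  density = dens
)

# Pairwise Pearson correlations, rounded for readability.
cor_matrix <- round(cor(toy), 2)
print(cor_matrix)
```

On the real data, `cor()` applied to the numeric columns would give the same kind of table, which can be read alongside the plot matrix.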
The strongest relationship is between quality and alcohol level.
In this section we will look at combinations of variables and how they relate to quality. We want to find ranges for the components that affect quality.
Alcohol is clearly the variable that makes the most difference, but beyond that you can observe ranges in some variables where higher quality wines are concentrated.
For instance, alcohol and density not only influence quality but are also related to each other: a higher density means a lower alcohol level.
I fitted just a simple linear regression of quality on alcohol level. Alcohol is the most influential variable, but the linear model extrapolates poorly: it would predict that a wine with an alcohol level of 20 is better than one with an alcohol level of 14, which is not true in reality. The R^2 is also very low.
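To illustrate the extrapolation problem, here is a minimal sketch of a one-predictor linear model fitted with `lm()`. The data are synthetic and the coefficients used to generate them are assumptions, so the numbers it prints say nothing about the real wines:

```r
set.seed(1)  # made-up data for illustration only
alcohol <- runif(300, 8, 14)

# Assumed toy model: a weak upward trend in quality plus a lot of noise,
# rounded and clamped to the 3-9 scores seen in the data.
quality <- pmin(9, pmax(3, round(2.5 + 0.32 * alcohol + rnorm(300, sd = 0.8))))
toy <- data.frame(alcohol = alcohol, quality = quality)

# Simple linear regression of quality on alcohol.
fit <- lm(quality ~ alcohol, data = toy)
r_squared <- summary(fit)$r.squared

# Predict at 14 (edge of the observed range) and at 20 (far outside it).
# The straight line keeps rising even though no such wines were observed.
pred <- predict(fit, newdata = data.frame(alcohol = c(14, 20)))

print(coef(fit))
print(r_squared)
print(pred)
```

A positive slope guarantees that the prediction at 20 is higher than at 14, which is exactly why a straight line should not be trusted beyond the observed alcohol range.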
Alcohol level has the largest influence on quality. We can see that lower quality wines are mainly below 12% alcohol.
Density also affects quality, and density and alcohol are negatively correlated. Quality tends to improve for density levels below 0.9925. Some other variables have a smaller influence, such as pH, chlorides, free sulfur dioxide and total sulfur dioxide. The best ranges for each of the variables are marked on the graph.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 6.000 7.000 6.652 7.000 8.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
I created a subset to select the best wines according to the ranges found in the previous graphs. This selection includes 164 wines. Its median quality is 7 (versus 6 for the whole data set), and the first and third quartiles also improve by one point.
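A range-based selection like this can be sketched with `subset()`. The rows below are made up, the column names are assumed, and the alcohol cut-off is an illustrative guess. Only the density, volatile acidity and free sulfur dioxide limits are taken from the text, and the full set of ranges marked on the graphs is not reproduced here.

```r
# Hypothetical rows; column names are assumed for illustration.
ww <- data.frame(
  alcohol = c(9.2, 12.5, 13.0, 10.1, 12.8),
  density = c(0.9960, 0.9905, 0.9912, 0.9951, 0.9918),
  volatile.acidity = c(0.30, 0.25, 0.70, 0.28, 0.22),
  free.sulfur.dioxide = c(45, 30, 35, 60, 28),
  quality = c(5, 7, 6, 5, 7)
)

# Keep only wines that fall inside every range at once.
best <- subset(
  ww,
  alcohol >= 12 &              # assumed cut-off
    density < 0.9925 &         # limit mentioned in the text
    volatile.acidity < 0.6 &   # limit mentioned in the text
    free.sulfur.dioxide < 50   # limit mentioned in the text
)

# Compare the selection with the whole data set.
print(nrow(best))
print(summary(best$quality))
print(summary(ww$quality))
```

Comparing the two summaries side by side is what shows whether the chosen ranges actually shift the quality distribution upwards.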
There is no clear way to determine the quality of a wine from all the variables. You can find favourable ranges for some of the variables, but staying within them does not guarantee the best quality.
It may be more conclusive to identify ranges where the majority of wines are of low quality, for instance alcohol levels below 10%.
Additionally, quality is a subjective variable that depends on who grades the wines. This makes it more difficult to reach a conclusion about the optimum parameters for the best wine.